0ddccc60e vs cfa1a2c6b - NVFuser codegen diff

0ddccc60e
0ddccc60e change to cudaDeviceGetAttribute for clock & memory rate (#4241)
Liqiang Lu <116412316+liqiangxl@users.noreply.github.com>
Fri Apr 11 15:04:06 2025 -0400

cfa1a2c6b
cfa1a2c6b temp
Naoya Maruyama <nmaruyama@nvidia.com>
Fri Apr 11 16:15:08 2025 -0700

Command: build/test_nvfuser --gtest_filter=CombinedSchedulerTest.*
GPUs:
8 x NVIDIA H100 80GB HBM3
matches between runs
matches between runs
matches between runs

Test Diffs

1: CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_32
  Kernel 1    -14 +14    index type: int registers: 54 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

2: CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_96
  Kernel 1    -14 +14    index type: int registers: 54 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

3: CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_576
  Kernel 1    -14 +14    index type: int registers: 72 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

4: CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_768
  Kernel 1    -14 +14    index type: int registers: 72 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

5: CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_1024
  Kernel 1    -14 +14    index type: int registers: 72 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

6: CombinedSchedulerTest.LayerNormBackward/dtype_double_batch_216_hidden_65536
  Kernel 3    -9 +9    index type: int registers: 60 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

7: CombinedSchedulerTest.LayerNormBackward/dtype_float_batch_216_hidden_32
  Kernel 1    -14 +14    index type: int registers: 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

8: CombinedSchedulerTest.LayerNormBackward/dtype_float_batch_216_hidden_96
  Kernel 1    -14 +14    index type: int registers: 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

9: CombinedSchedulerTest.LayerNormBackward/dtype_float_batch_216_hidden_576
  Kernel 1    -14 +14    index type: int registers: 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

10: CombinedSchedulerTest.LayerNormBackward/dtype_float_batch_216_hidden_768
  Kernel 1    -14 +14    index type: int registers: 47 → 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

11: CombinedSchedulerTest.LayerNormBackward/dtype_float_batch_216_hidden_1024
  Kernel 1    -14 +14    index type: int registers: 47 → 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

12: CombinedSchedulerTest.LayerNormBackward/dtype_float_batch_216_hidden_65536
  Kernel 3    -9 +9    index type: int registers: 40 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

13: CombinedSchedulerTest.LayerNormBackward/dtype___half_batch_216_hidden_32
  Kernel 1    -10 +10    index type: int registers: 64 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

14: CombinedSchedulerTest.LayerNormBackward/dtype___half_batch_216_hidden_96
  Kernel 1    -10 +10    index type: int registers: 64 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

15: CombinedSchedulerTest.LayerNormBackward/dtype___half_batch_216_hidden_576
  Kernel 1    -10 +10    index type: int registers: 64 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

16: CombinedSchedulerTest.LayerNormBackward/dtype___half_batch_216_hidden_768
  Kernel 1    -10 +10    index type: int registers: 60 → 56 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

17: CombinedSchedulerTest.LayerNormBackward/dtype___half_batch_216_hidden_1024
  Kernel 1    -10 +10    index type: int registers: 60 → 56 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

18: CombinedSchedulerTest.LayerNormBackward/dtype___half_batch_216_hidden_65536
  Kernel 3    -9 +9    index type: int registers: 40 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

19: CombinedSchedulerTest.LayerNormBackward/dtype___bfloat_batch_216_hidden_32
  Kernel 1    -10 +10    index type: int registers: 64 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

20: CombinedSchedulerTest.LayerNormBackward/dtype___bfloat_batch_216_hidden_96
  Kernel 1    -10 +10    index type: int registers: 64 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

21: CombinedSchedulerTest.LayerNormBackward/dtype___bfloat_batch_216_hidden_576
  Kernel 1    -10 +10    index type: int registers: 64 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

22: CombinedSchedulerTest.LayerNormBackward/dtype___bfloat_batch_216_hidden_768
  Kernel 1    -10 +10    index type: int registers: 60 → 56 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

23: CombinedSchedulerTest.LayerNormBackward/dtype___bfloat_batch_216_hidden_1024
  Kernel 1    -10 +10    index type: int registers: 60 → 56 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

24: CombinedSchedulerTest.LayerNormBackward/dtype___bfloat_batch_216_hidden_65536
  Kernel 3    -9 +9    index type: int registers: 40 gmem: 3 static smem: 16 stack frame: 0 spill stores: 0 spill loads: 0

25: CombinedSchedulerTest.SharedProducer
  Kernel 2    -16 +16    index type: int registers: 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 4    -14 +14    index type: int registers: 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 5    -14 +14    index type: int registers: 56 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
  Kernel 7    -14 +14    index type: int registers: 48 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0

26: CombinedSchedulerTest.InnerOuterMismatch
  Kernel 1    -7 +7    index type: int registers: 32 gmem: 3 static smem: 0 stack frame: 0 spill stores: 0 spill loads: 0
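
Most kernel rows above keep the same register count, so it can help to filter the report down to the rows where it changed. The following is a minimal Python sketch for doing that, not part of the NVFuser tooling. It assumes the row shape seen in this report (`Kernel <n>  -<removed> +<added> ... registers: <old>[→ <new>]`), and the names `KERNEL_ROW`, `parse_kernel_row`, and `register_changes` are hypothetical.

```python
import re

# Assumed row shape, inferred from the kernel rows in this report only:
#   "Kernel <n>  -<removed> +<added> ... registers: <old>[→ <new>] ..."
KERNEL_ROW = re.compile(
    r"Kernel\s+(?P<kernel>\d+)\s+-(?P<removed>\d+)\s+\+(?P<added>\d+)"
    r".*?registers:\s*(?P<old>\d+)(?:\s*→\s*(?P<new>\d+))?"
)

# Test headers look like "10: CombinedSchedulerTest.Name/params".
TEST_HEADER = re.compile(r"\s*\d+:\s+(\S+)")


def parse_kernel_row(line):
    """Return a dict for one kernel row, or None if the line is not a kernel row."""
    m = KERNEL_ROW.search(line)
    if m is None:
        return None
    old = int(m.group("old"))
    # When no arrow is present the register count is unchanged.
    new = int(m.group("new")) if m.group("new") else old
    return {
        "kernel": int(m.group("kernel")),
        "removed": int(m.group("removed")),
        "added": int(m.group("added")),
        "registers_old": old,
        "registers_new": new,
        "register_delta": new - old,
    }


def register_changes(report_text):
    """Return (test_name, row) pairs for rows whose register count changed."""
    changes = []
    test_name = None
    for line in report_text.splitlines():
        header = TEST_HEADER.match(line)
        if header:
            test_name = header.group(1)
            continue
        row = parse_kernel_row(line)
        if row is not None and row["register_delta"] != 0:
            changes.append((test_name, row))
    return changes
```

Calling `register_changes` on the text of this report would return only the rows that contain an arrow in the `registers:` field, each paired with the test name that precedes it.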